Mon. Not. R. Astron. Soc. 000, 000-000 (0000) Printed 12 January 2010 (MN KTeX style file v2.2) 



Improving PSF modelling for weak gravitational lensing using new 
methods in model selection 



o ' 

^ Barnaby Rowe* 

C ^ ^ Institut d'Astrophvsique de Paris, UMR7095 CNRS, Universite Pierre et Marie Curie - Paris 6, 98 bis, Boulevard Arago, 75014 Paris, France 

(N ■ 

a 



(N 



o 

u 

^' 

o 

^\ 

(N 
> 

in 
o 

m 

^' 

o 
a^ 
o 



12 January 2010 



1 INTRODUCTION 



The study of weak gravitational lensing (see, e.g., ISchneide3 
l2006l for a recent review) promises much as a means of de- 
tecting and measuring massive structure on cosmological scales. 
Through its sensitivity to all lensing mass whether baryonic or 
exotic, weak lensing potentially provides direc t measurement of 
the cosmological matter power spectrum (e.g. IFu et al.M200 ^, a 
mean s of relating this power to vis i ble structure (e.g. | Hoeks tra et al.l 
l2005l : iMandelbau m et al.1 IZOOd iTian. Hoekstra & Zhad l2009h . 
and the mapping of individual clusters and super-cluster s (e.g., 
iMassev et allzOOTal : iHoekstrdlzOOTl : iHevmans et ai]|2008h . How- 
ever, extracting an unbiased and uncontaminated shear signal from 
real telescope images in current and upcoming surveys represents 
an unprecedented technical challenge. 

Much work has gone into testing and refining the many 
competing weak lensing data processing pipelines, using simu- 
latio ns of survey imagine data wi th known input lensing sig- 
nals teevmans et alj20()6l : lMassev e t al. 2007b; Bridle et al. 2008). 
This simulation work has understandably concentrated upon the 
problem of shear measurement from noisy galaxy images, af- 
ter their convolution with an anisotropic telescope point spread 
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ABSTRACT 

A simple theoretical framework for the description and interpretation of spatially correlated 
modelling residuals is presented, and the resulting tools are found to provide a useful aid to 
model selection in the context of weak gravitational lensing. The description is focused upon 
the specific problem of modelling the spatial variation of a telescope point spread function 
(PSF) across the instrument field of view, a crucial stage in lensing data analysis, but the tech- 
nique may be used to rank competing models wherever data are described empirically. As such 
it may, with further development, provide useful extra information when used in combination 
with existing model selection techniques such as the Akaike and Bayesian Information Crite- 
ria, or the Bayesian evidence. Two independent diagnostic correlation functions are described 
and the interpretation of these functions demonstrated using a simulated PSF anisotropy field. 
The efficacy of these diagnostic functions as an aid to the correct choice of empirical model is 
then demonstrated by analyzing results for a suite of Monte Carlo simulations of random PSF 
fields with varying degrees of spatial structure, and it is shown how the diagnostic functions 
can be related to requirements for precision cosmic shear measurement. The limitations of 
the technique, and opportunities for improvements and applications to fields other than weak 
gravitational lensing, are discussed. 

Key words: gravitational lensing - methods: data analysis - methods: statistical - cosmology: 
observations - cosmology: large-scale structure of the Universe. 



function (PSF) and subsequent pixelization onto CCD arrays. 
However, there are other important stages in any weak lens- 
ing analysis that have been subjected to less scrutiny, such 
as: stacking and dithering of exposures to create science im- 
ages, selection of stars for PSF modelling, colour dependence 
of instrument PSFs when using broad band filters, and the ac- 
curate modelling of the s patial variation of the PSF across 
the telescope field of view. Pau lin-Henriksson et alj ( 12008 ) and 
IPaulin-Herrriksson. Refregier & Amara (2009) have recently con- 
ducted work into the amount of information required for character- 
izing typical PSFs at a given point, described in terms of 'complex- 
ity' and 'sparsity', but the amount of information about the overall 
spatial variation across the sky is less well-addressed by the liter- 
ature. Moreover, this spatial variation in the PSF is purposefully 
ignored by curre nt shear measurement testing simulations (e.g., 
iBridle et al.ll2008h . reserved as a separate issue. 

The work in this paper is motivated towards finding ways 
to improve this aspect of PSF modelling, which traditionally 
takes the form of fitting polynomial surfaces to quantities that 
represent important properties of the PSF. In the KSB method 
dKaiser. Squires & BroadhursJ 1 19951 : iHoekstra etal] 1 19981) these 
quantities are frequently the two components of stellar anisotropy 
correction, estimated using the measured ellipticities of stars in the 
field of view. For techniques that model PSFs with shapelet ba- 
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sis functions (see, e.g., 'Bernstein & Jarvisll2002l: lRefregieill2003l : 
IRefregier & Bacoij |2003: Massev & Refregiej|2005l) it is the spa- 
tial variation in each shapelet coefficient that is described us- 
ing parametrized surfaces. Regardless of the quantity being mod- 
elled there is considerable freedom in the choice of the func- 
tional form of the fitting surfac e (see, however. [Rhodes et alj2007l : 
Ijarvis. Schechter & Jaiiill2008l for physically motivated PSF mod- 
els), and simple bivariate polynomials are typically used. Known 
problems with polynomial fitting surfaces, including reduced sta- 
bility at field edges and corners, have been noted but not neces- 
sarily tackled beyond su ggestions of other, perhaps better behaved, 
functional schemes (e.g. IVan Waerbeke. Mellier & Hoekstral200^ 
who used dense stellar fields to characterize structure in the PSF). 
Choices must also be made as to whether to model the PSF in- 
dependently in areas imaged by different CCD chips, and how to 
weight stars of different signal to noise in the same fit. Unfortu- 
nately there is often little guidance from the data itself, as accu- 
rately estimating the uncertainty on a stellar ellipticity measure- 
ment or shapelet coefficient is difficult. 

One recent development has been the suggestion of Principal 
Component Analysis (PCA) as a means of building up knowledge 
of the PSF ( Jai-vis & Jain 2004; Schrabback et al. 2009), allowing 
the use of more complex polynomial surfaces coupled with a more 
rigourous quantification of the degree of information redundancy. 
The technique uses a large number of images to explore the princi- 
ple vectors in the space of observed PSF models, and will clearly 
form a crucial part of future PSF modelling for large surveys. Using 
PCA, overfitting can be controlled whilst ensuring that all the ob- 
servable features in the PSF are properly modelled. However, this 
approach has one important caveat: it assumes that there is no inde- 
pendent random or complex quasi-random component to the PSF 
anisotropy in any given field. This assumption will conceivably be 
broken for ground-based data (possibly even from space), specifi- 
cally if the anisotropy is a combination of predictable and complex 
or chaotic effects. In this paper the investigation will instead fo- 
cus upon a complementary question: whether there are further tests 
of the modelling quality of a single PSF anisotropy map, includ- 
ing those created using PCA, without requiring it be drawn from a 
physically predictable underlying distribution. 

Tests for overall control of systematics, and indirectly there- 
fore the quality of the PSF model, do exist in weak lensing once 
the shear s ignal can be decompo s ed into E-mode/B-mo de compo- 
nents (e.g.. lCrittenden et alj2002l : ISchneider et alj2002h . However, 
these tests are only possible after the lensing shear is measured, at 
the end of the analysis, and B-modes may also be generated by 
intrinsic alignments and source clustering. An important investiga- 
tion into t he effe cts of imperfect PSF modelling was performed by 
iHoekstral diooj) . who analyzed residual correlations in PSF mod- 
els using the aperture-mass statistic and studied the impact of these 
correlations upon cosmic shear measurements. This paper naturally 
follows on from that work by providing a formal discussion of the 
reasons for residual correlations in poorly modelled data. In addi- 
tion, it presents a first investigation into whether such correlations 
may be used as an aid to the systematic selection of PSF modelling 
schemes, and as an aid to modelling in general. 

The assessment of goodness of fit (see, e.g., lLuptonlll993h 
of a given model to the physical data is a vital stage in any sci- 
entific analysis, and the related field of model selection is attract- 
ing increasing interest within astronomy as a means of evaluating 
evidence for competing cos mological models (for recent reviews 
in a strophysical contexts see lLiddle. Mukheriee & Parkinsonll2006l 
for recent discussions and applications see, e.g.. 



lLiddlell2007l : lEfstathioul [20o3 : iKurek & Szvdlowskill2008l) . Esti- 
mates and uncertainties upon model parameters derived from any 
fit are meaningless if the model itself is an unlikely match to the 
data, and if further analyses depend upon the accuracy or stability 
of this model then later conclu sions may be biased or subject to un- 
necessary additional variance jPaulin-Henriksson et alj|2009l) . The 
problem can become acute in applications where empirical or un- 
verified physical models must be used, or where errors upon mea- 
sured data points are difficult to estimate accurately. These prob- 
lems are precisely those encountered when attempting to model the 
spatial variation of the PSF in weak lensing applications. 



The most famous and often- used diagnosti c of goodness of 
fit is the chi-squared statistic (see lLuptonl I993h . and simple chi- 
squared per degree of freedo m arguments are often used as a means 
of model selection (see, e.g.. ISpergel et alj|2007h . A related mea- 
sure that generalizes to non-Gaussian distributions is the Akaike 
Information Criterion (AIC), derived from an approximate min- 
imization of the Kullback-Leib ler information entropy (see, e.g., 
iLid dle et al.ll2006l: lTrottj|2008l) . Bayesian probability theory (see 
Gel manetal.ll2003l " for a comprehensive general reference) also 
provides two further guides to model selection: the full calcu- 
lation of the Bayesian evidence, and its related approximatio n 
the Bayesian Information Criterion (BIC: again see lTrottj|2008h . 
These criteria all use calculations of the statistical likelihood £ of a 
dataset given the model in question, either the via integration of C 
over the full possible parameter space in the case of the Bayesian 
evidence, or by comparison of the best-fitting model maximum 
likelihood £max to the number of parameters in the model. In or- 
der to reliably calculate these 'data-given-model' likelihoods, the 
probability distributions p{yi) of individual measured data points 
yi must be known or well-approximated; as discussed above, this 
is seldom the case in PSF modelling contexts. 



m a strophysical I 
and lTrottj|2003 : 



One important topic of this paper is to discuss other proper- 
ties of the relationship between model and data that can be usefully 
explored without good prior knowledge of the uncertainties upon 
individual data points. A related investigation, merely initiated by 
this work, will be to begin to understand what information is be- 
ing lost when employing model selection arguments based entirely 
upon data-given-model likelihoods or related criteria. As will be 
shown, such information may be of use when diagnosing goodness 
of fit. The spatial correlation of residuals for both underfitting and 
overfitting models is partly predictable, and this insight can also be 
used to guide modelling improvements in a way that conforms to 
the principle of Occam's Razor. 

The structure of this paper is as follows: in Section[2]a basic 
theory of correlations in model residuals is described, specifically 
aimed at models of stellar ellipticity. This leads to the construction 
of two independent diagnostic functions, and makes predictions for 
the behaviour of these functions for under- and overfitting models. 
Section[3]then applies these diagnostics to a test-case scenario of a 
simulated starfield with a known underlying PSF anisotropy model. 
In Sections |4]&|5] these tests are repeated on a suite of simulated 
starfields with varying degrees of spatial structure in the PSF map. 
Relating the diagnostics to requirements for cosmic shear surveys is 
discussed in Section|6l In Section|7]the possibility of a generalized 
extension beyond weak lensing is discussed, followed in Section[8] 
by a general summary and conclusions. 
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2 SPATIAL CORRELATIONS IN MODEL RESIDUALS 
2.1 Basic theory and assumptions 

The starting point in this analysis is to construct a simplified de- 
scription of the process of modelling PSF anisotropy across the x-y 
plane, such as across a telescope field of view. In what follows the 
discussion is limited to models of complex, spin-2 pseudo-vector 
fields (i.e. f(x,y) = ?/)e'^*''^'''') on a 2D plane, but such ar- 

guments may be generalized to scalar or spin-n fields, and to other 
spatial dimensionalities (see Section[8l(. 

An observed complex ellipticity field is described as the sum 
of two contributions, the unknown 'true' ellipticity field et and A'^, a 
stochastic complex variable describing noise upon ellipticity mea- 
surements. This field is labelled e{x, y) — ei + ie2, where: 



■et+N. 



(1) 



In almost all that follows the explicit x-y dependence of fields such 
as et will be dropped for brevity, and should be implied. The de- 
scription of discrete observations by a continuous 'quasi-field' 
e is also a notational convenience: measured quantities will always 
be an e(xi,yi) sample of the proposed field e. We assume that the 
stochastic noise term satisfies (A^) = 0. 

If a best fit parametrized model is then applied to describe e, 
one can choose to write the modelled ellipticity em as 



et -I- m(et, iV; /m), 



(2) 



where m = e — et will hereafter be referred to as the inaccuracy 
in the model, also unknown, and /m is simply a label denoting the 
functional scheme used to fit the spatial variation of e (e.g. a sec- 
ond order bivariate polynomial). This expression makes explicit the 
dependence of the inaccuracy m upon et, the ensemble of discrete 
realizations of A'^, and /m. In general this dependence will be non- 
trivial to describe, but the aim of this paper is to look for diagnostic 
tests by which the functional properties of m can be constrained. 

The principal tool in this work is the two point ellipticity cor- 
relation function, the observable quantity used to extract signal in 
cosmi c shear studies. In this the investigation follow s Hoekstral 
(l2004h. as well as the later work of lVan Waerbeke et alj ^20051) and 
iHoekstra et alj j2006h . in analyzing correlations in corrected PSF 
patterns as a means of testing anisotropy models. Specifically the 
^± (r) correlation functions will be used, defined as 



i±(r) = (etan(a; + ■r)etan(a;)) ± (ex (a; + r)ex{x)) . 



(3) 



where etan and ex are the known as the tangential and rotated 
components of the ellipticity, and the angle brackets denote an av- 
eraging over all pairs of points separated by a distance r. It should 
be noted therefore that this definition implicitly averages over all 
angles, which is not a problem in the cosmic shear case where cor- 
relations are assumed isotropic so that ^(r) = ?(''); if the field is 
not strictly isotropic then it must be borne in mind that ^(r) is the 
angular average correlation, and thus that some information may 
have been lost. 

The tangential and rotated components are defined for the 
complex ellipticities e of each pair as 



etan + iex 



-2i0 



(ei + ie2) 



(4) 



(see, e.g., ISchneideill200^ . where <f> is the angle between the ab- 
scissa and the line joining the location of each member of the pair 



of points. With a discrete number of ellipticity measurements the 
quantities in equation ([3} can then be estimated by 



etaii(a; + r)etaii(a;) ± ex (a; + T')ex(a;)(5) 



in finite bins of r. Such correlation function estimates, when made 
using PSF-corrected galaxy ellipticities, are the primary observable 
quantity in modem studies of cosmolo gical weak lensing (see, e.g., 
IFu et al.ll2008l : ISchrabback et al.ll2007h . 

In the following discussion, which focuses upon S,+ as a di- 
agnostic of PSF modelling, frequent use will be made of a useful 
shorthand notation: 

(e*e) = ((etan - iex) (etan + iex)) (6) 

+ i (etan(a; + r)ex(a;) - ex(a; -I- r)etan(a;)) . (7) 

Due to parity symmetry (see lSchneidej2006[) the imaginary second 
term in equation Q tends to zero, and so (e*e) (r) — ^+ (r). Sim- 
ilarly, {e^Cm) (r) will be used to denote the ^+ (r) autocorrelation 
in the model em. This notation is convenient not only as a labelling 
convention but also as a computational tool, and so will be used 
exclusively hereafter. 

In order to proceed, two simplifying assumptions are made 
about the noise. Firstly, it is assumed that A' is spatially 
MHCorrelated so that 



{N*N) (r) = 



(8) 



for all r > 0. Secondly, it is assumed that the cross-correlation 
between TV and the unknown et is also zero, so that 



(et*Ar + Ar*et>(r) = 0. 



(9) 



Situations in which equations ^ and l|9) no longer hold can 
be envisaged, such as in the presence of problems with reduced 
Charge Transfer E fficiency (CTE) in telescope CCDs (see, e.g. 
[Rhodes et alj2007l) or preferences for certain PSF directions due to 
large pixel sizes or under sampled stellar images. In this theoretical 
analysis these factors are assumed to be small, but if necessary the 
assumption may be easily tested and corrected for using simulated 
data. 



2.2 Fit diagnostics 

For an ideal fit m = everywhere, but this is unrealistic in the 
case of noisy data and finite numbers of observations. For the case 
of an imperfect but well-constrained, stable and accurately predic- 
tive model the following three conditions should be simultaneously 
fulfilled: 



(m*et + e* m) (r) ~ 
{m*N + N*m){r) ~ 
{m*m) (r) ~ 



(10) 
(11) 
(12) 



for all r > 0. These conditions will be met if the modelling inac- 
curacies m can, like the noise, be approximately described as an 
independent, stochastic variable m = M with (Af) = 0. 

The first of these conditions l IlOl l requires that the inaccuracies 
m should be distributed at random with respect to the true elliptic- 
ity distribution, and thus that the cross-correlation between these 
quantities be zero; this condition will be broken if the model is sys- 
tematically underfitting the true ellipticities et, as will be discussed 
in Section 1231 below. The second condition jl It explicitly states 
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that m is required to be uncorrelated with the noise A*', expected if 
the scheme /m allows significant overfitting of the data. The third 
and final condition l ll2t requires that m show no spatial autocor- 
relation, which will be fulfilled if the inaccuracies at all points are 
mutually randomly distributed. This third condition may be broken 
for both overfitting, where neighbouring m values may be corre- 
lated to the extent allowed by instabilities in the chosen /m, and 
underfitting, where em systematically fails to reproduce observable 
features in et. 

Unfortunately, the functions in equations J10b-J12t cannot be 
measured directly, as m, A'^ and et are unknown. However, it is 
possible to construct two independent, observable correlation func- 
tions with which to attempt to constrain these quantities. There is 
freedom in how these are defined (as will be discussed below), but 
the following useful forms are suggested: 

Di(r) = ((e-e„0*(e-em)>(r) (13) 
D2{r) = (e*(e-enO + (e-enO*e)(r). (14) 

These two diagnostic quantities D\ and D2 may be easily esti- 
mated from modelled stellar fields using routines that are stan- 
dard in statistical lensing. Using the definitions and assumptions 
described by equations (|Tl(-([9](, they can be expressed in terms of 
the unknown quantities as follows: 

Di(r) = - {m* N + N* m) {r) + {m' m) {r) (15) 
D2(r) = ~ {m* N + N*m) [r) - {■m*et + elm) ir). (16) 

It is noted immediately that there are only two quantities with 
which to constrain the three unknowns of conditions (I10t-( ll2b . and 
the system is therefore underdetermined. This is a natural conse- 
quence of there being only two directly observable quantities (e 
and em) with which to place constraints upon correlations in m, 
and et. The system of equations described by dlSI l and ( |16l > can- 
not therefore be solved to demonstrate any of lll0t-(ll2t uniquely. 
Instead, one may determine only a general family of solutions. In 
vector notation this solution is simply 

{m*m)(r) \ / 1 \ / Z?i(r) \ 
{m*N + N*m){r) = 1 t+ (17) 

(m*et + et*m) (r) / \ -1 / \ -D2(r) / 

where t is any real number. Attempts to construct a third indepen- 
dent observable with which to break this degeneracy do not suc- 
ceed: for example, the observable function {e*e) — (emem) may be 
written simply as D2 — D\ . 

The fact that this system of equations is underdetermined 
means the D\{r) or D2(r) diagnostics can never be used to prove 
that {m*m) = {m*N + N*m) = (m*et+etm) = at all 
scales. However, they allow the positive diagnosis of poor mod- 
elling if either are nonzero at a significant level. Moreover, in order 
to be erroneously led into the belief that a given model fit is accu- 
rate and stable would require that 

(m*m) ~ {m*N + N*m) ~ - (m*et + e* m) (18) 

for all scales r, constituting a significant coincidence and very poor 
luck. Finally, although the general expression in l |17| ) allows an in- 
finite family of solutions to our three quantities l ll0ll2b . this solu- 
tion is not the only information we have. Further assumptions and 
simple reasoning may be employed so as to predict what might be 
expected for these diagnostic measures in the cases of over- or un- 
derfitting; this in turn may allow the tuning of modelling schemes 
so as to better represent the data without fitting noise. These rea- 
soned expectations for the general form of Di (r) and D2 (r) will 
now be outlined. 



e(x) 




Figure 1. Schematic showing how {rri'm) (r) may be expected to be both 
negative and positive, depending upon scale r, for underfitting models. The 
dotted line is a given 'true' model, and the thick solid line a best-fit straight 
line. The model inaccuracy m will be coirelated within the two cross- 
containing regions, but anticorrelated on the larger scales between the two 
regions, indicated by the closed arrows. 

2.3 The case of underfitting 

In order to describe what might be expected in the case of under- 
fitting, we consider a hypothetical scheme /m for which the model 
ellipticities em do not adequately represent the underlying field et . 
This can be expressed in terms of the simple model constructed in 
Section im and the further assumption that in the limit of severe 
underfitting it may be approximated that 

m = m{N, et; /m) ^ m(et; /m). (19) 

The validity of this approximation will depend strongly upon the 
degree to which a given fit fails to reproduce observable features in 
et, but it can be motivated by an illustrative example: a flat model 
scheme /m for which em ~ (e) — constant being used to de- 
scribe an et which varies significantly with x and y. In this case 
m = (e) — et, which is a clear function of et and only very weakly 
dependent upon the noise via (e). 

Given the assumption of equation J19b , it should be expected 
that m is approximately uncorrelated with the noise A'^ for under- 
fitting models (carrying only a weak dependence upon it), and thus 
that 

{m*N + N*m) (r) ~ {m*(et; .fm)N + N*m{et; U)) (20) 
~ 0. (21) 

The condition expressed in equation i ll U is therefore approxi- 
mately fulfilled for underfitting models, meaning that observations 
of Di (r) and D2 (r) will provide constraints upon the quantities in 
equations l llOb and l ll2t . Using equations (Tsj and ( I14l >, this then 
gives 

Di{r) ~ (m*m)(r) (22) 
D2{r) ~ - (m*et +etm) (r). (23) 

These predictions can now be used to make further conclusions as 
to the expected form of these functions, allowing the positive diag- 
nosis of underfitting models if these expected forms are observed. 
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From equation ( |19l > it is clear that the form taken by (rn'm) 
will be related to that of (e^et), and if the underfitting is signifi- 
cant then {m*m) is likely to be nonzero on or near the scales for 
which (e*et) is nonzero. Figure[T]shows a simple example of how 
both correlation and anticorrelation might be expected in underfit- 
ting model residuals; in this simple example of a one-dimensional 
underfitting model the residuals within the cross-containing regions 
will be positively correlated, but between these regions the residu- 
als will be negatively correlated. Turning to the simple em — (e) 
model for another illustration, using equations J22b and (A'^) ~ 
it may be written that Di(r) ~ (e^et) — |(et)|^. Therefore, this 
Di{r) may potentially be both positive or negative for differing 
ranges r, depending upon the functional form of et. 

This behaviour will also be expected from the second mea- 
surable quantity D2{r), which from equation J23b will also be ac- 
tivated by underfitting: (m*et + etm) will often be nonzero for 
m ~ m(et; /m). The function D2(r) may be rewritten by consid- 
ering that m = em — et, giving 

D2(r) ~ -(m*et + e*m) 

= 2 (e*et) - (emet + ej em) . (24) 

Once again, D2 ir) may be expected to be either positive or nega- 
tive depending upon scale r and the sign of {e*et). However, the 
precise dependence of D2 ir) may differ from that of D\ (r) in a 
way that is difficult to predict without detailed knowledge of the et 
field. 

2.4 The case of overfltting 

The case of overfitting is now considered so as to make predic- 
tions for the behaviour of Di (r ) and D2 (r ) : if the behaviour differs 
from the underfitting case this will allow the two cases to be distin- 
guished and positively identified, and allow subsequent modelling 
improvements to be made in a guided fashion. As mentioned in 
Secti onfT] the application of PCA to PSF modelling jjarvis & JainI 
l2004h also brings control over overfitting via the removal of identi- 
fied low-importance principal components. However, it may be that 
a hybrid of PCA and the simultaneous fitting of an independent sur- 
face is necessary to account for random changes in the PSF pattern 
between exposures, and so a diagnosis of possible overfitting in 
this extra surface will still be desirable. Moreover, the behaviour of 
D\ (r) and D2 (r) in overfitting models is an interesting investiga- 
tion in itself, as the technique may be useful in other situations in 
which PCA is not directly applicable (see Section]?}. 

In Section l231 the inaccuracy m was approximated as being a 
function of et and /m only. A similar approximation can be argued 
for the opposing case of severe overfitting: 

m = m{N, et; /m) m{N; /m). (25) 

This statement can be justified by considering what is meant by 
an overfitting model: one which captures not only an observable 
physical trend, but is also unjustifiably sensitive to random noise 
upon measurements. Such models will not in general be biased 
in a way that can be related to et, but will prove to be unsta- 
bl e with respect to changes in (this concept is also discussed 
in lPaulin-Henriksson et al.ll2009h . The correlation properties of m 
are now largely decided by this random quantity and inherent cor- 
relations caused by the best-fitting chosen model /m. Taking this 
assumption, and using equations M5\ and ([9}, leads to 

{m*et + e* m) (r) ~ {m*(N; /m)et + elm{N; /m)) (26) 
~ 0, (27) 
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via the same reasoning as equation MQ\ . This corresponds to saying 
that condition dlOl l will be fulfilled when severely overfitting data, 
just as in Section [23l it was argued that condition Jilt is automati- 
cally fulfilled if data is being underfit. 

Using this result, the effects of overfitting upon the forms of 
D\ (r) and D2 {r) can be predicted. From equations ( |13b , and 
( 127b it may be written that 

Dx(r) = {m*rn){r)-{m*N + N*m){r) (28) 
D2{r) ~ ~{m*N + N*m){r). (29) 

It is noted immediately that the expression for Di (r) remains un- 
simplified: unlike (m*et + etm), the {m*m} term will not nec- 
essarily vanish for overfitting models, despite the assumption of 
uncorrelated noise in equation {8}. As can be seen from equation 
i25l . there remains the possibility of correlation due to the inherent 
properties of the functional form of the model /m used to unstably 
fit the data. 

It is expected that the condition il2\ is not fulfilled when we 
are overfitting the data, and thus that D2{r) will be nonzero. It 
can furthermore be said that a positive cross-correlation will be ex- 
pected between A'^ and m, so that 

{m*N + N*m) > 0. (30) 

This can be justified by considering once again m = em — et, 
which combined with equation (|9} then gives 

{m*N + N*m) (r) = (e;;,iV + iV*em) (r) > 0. (31) 

This last inequality indeed expresses a definition of overfitting, be- 
ing that the best-fitting model em shows some significant average 
correlation with the particular realization of the noise A'^. It is then 
clear from equations J29t and ( |31t that 

Diir) ^ -{m'N + N'm) (r) <0, (32) 

i.e. nowhere positive and potentially significantly negative, for 
overfitting models. In Section l231 it was seen that D2 (r) is expected 
to be positive over some range of r for severely underfitting mod- 
els. Whilst the minimization of D2{r) will lead to an optimal fit in 
any case, this difference in behaviour between under- and overfit- 
ting offers hope of diagnosing successfully between the two cases, 
allowing informed improvements in modelling to be made at each 
stage. 

Furthermore, it can be argued that in most cases the (m*m) 
due to undue freedom in the model /m will be small compared to 
{m*N + N*m); if this is so the Di (r) function may also be used 
to distinguish under- and overfitting. The argument relies on the 
following insight: an overfitting model, but one that has neverthe- 
less been constructed by minimizing deviations from the observed 
data, will in all but the most pathological cases be expected to have 
inaccuracies \m{N; /m)| < lA'il at each point {xi,yi). The lim- 
iting example of this behaviour is an 'ultimate overfit', for which 
the model is simply e^n ~ e (and thus m = A'), with an interpo- 
lation or spline between stellar data points. Such a model takes no 
account of the fact that there may be noisy or imperfect measure- 
ments among the input data. Using \m{N; /m)l < \N\ then gives 

{m*m) < ^{m*N + N*m) < {m*N + N*m) (33) 
which in turn implies that 

Di{r) = (rn'm) (r) - {m*N + N'm) (r) < (34) 
in the same way as D2{r). This offers further hope of positive. 
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distinguishable diagnoses of both under- and overfitting, based on 
whether -Di(r) and D^ir) are observed to be both positive and 
negative, or negative only, respectively. 

To summarize, the functional behaviour of the Di(r) and 
D2{r) diagnostic functions for both over- and underfitting mod- 
elling schemes /m has been predicted using simple assumptions 
about the nature of the modelling, the noise, and the data them- 
selves. The most important, and potentially least secure, of these 
assumptions are the validity of the approximate functional depen- 
dencies of m, given in equations J19l > and i25\ . In order to test these 
points of reasoning, and to explore the strength of each diagnostic 
function in a realistic modelling situation, the following Section 
will test the efficacy of Di{r) and D2(r) using a simulated PSF 
anisotropy map with a known, underlying ellipticity model. 



3 TESTS ON A SIMULATED ANISOTROPY MAP 

The diagnostic functions Di{r) and D2(r) have been shown to 
offer some promise as diagnostics of both over- and underfitting 
models. The discussion has, from the start, focused upon the mod- 
elling of ellipticity fields across a plane, a process very commonly 
undertaken in the modellin g of anisotr opic PSF s for weak lens- 
ing (e.g., [ Kaiser et al.l ll995l ; THoekstra et al.ll 19981 ; iLeauthaud et all 
|2007| : IFu et al.i.2008f) . Improving this modelling is the primary mo- 
tivation of this work, and so in order to test Di (r) and D2 (r) it 
is appropriate to simulate modelling conditions similar to those en- 
countered in such weak lensing analyses, exploring whether some 
of the simple insights of [2] hold validity in such a regime. In this 
Section the discussion will concentrate upon a single simulated PSF 
anisotropy field (hereafter referred to as simply the starfield) in or- 
der to illustrate the forms of Di (r) and D2 (r) in some detail, and 
give visual examples. 



3.1 Constructing the starfield 

The simulated starfield is designed to mimic measurements of 
ellipticity from 2500 stars across a square telescope field of 
view of area 1 deg^. The number of stars in the field, the field 
shape, typical PSF anisotropy and measurement noise are loosely 
based upon lensing observations from the Canada -France-Hawaii 
Telescope Legacy Su rvey- Wide (CFHTLS-W, e.g. IFu et al.|[20o3 : 
iHoekstra et ■^l2006h . An ellipticity measurement e was assigned 
to each star as modelled in equation ([TJ, consisting of a true un- 
derlying model et and additive noise term TV, each scaled so as to 
resemble the CFHTLS-W. 

In order to construct et (x, y) the telescope field of view is first 
defined upon a set of coordinates x' , y', with both x' and y' varying 
in the interval [-1,1]. Each component of the true field et was then 
modelled by a bivariate, fifth order Chebyshev polynomial defined 
as 

j + k<5 

(et), = <^^jkTj{x')my') (35) 

i,fc=0 

with i = 1, 2 denoting the real and imaginary parts of et respec- 
tively, and where Tj (x) is the j th order Chebyshev po lynomial of 
the first kind (see Figure[2j also lArfken & Webeill2005l) . From this 
Figure the reason for choosing these Chebyshev polynomials be- 
comes clear: each term in equation i35l will add contributions of 
similar magnitude across the interval [-1,1]. The value of each Uijk 
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Figure 2. Lowest order Chebyshev polynomials of the first kind, plotted 
in the interval x G [-1,1]. Upper panel: T\{x) (solid line), T'i{x) (dotted 
line), T^(x) (dashed line). Lower panel: T2(x) (solid line), T^(x) (dotted 
line), Te ix) (dashed line). The normalization properties of these functions 
on the chosen interval make them well-suited for simulating et fields with 
approximately equal degrees of spatial structure on varying scales. 

was then assigned by random sampling from a Gaussian distribu- 
tion of mean /la = and standard deviation da = 0.01. The co- 
ordinates are then transformed from x\ y' to an 'observed' x, y (in 
arcminutes) via 

' I 2y 

"^60"^' ^^60"^' ^^^^ 
leading to a functional expression et(x,y) across the 60 x 60 
arcmin^ simulated field of view. 

The random noise TV added to each component of et was 
then modelled as a Gaussian random variable with mean fiN = 
and standard deviation (tat — 0.015, matching scatter in ellipticity 
measurements for the bright stars (typically with i-band magnitude 
i < 22) used in PSF modelling in the CFHTLS-W. The resulting 
simulated ellipticities for this test field can be seen in Figure [3] 
along with the (e*e) correlation function for the starfield. 

It should be noted that whilst the amplitude and noise prop- 
erties of this starfield have been designed to loosely approximate 
real stellar ellipticity data, no attempt has been made to ensure that 
the spatial variation in the underlying et resembles coherent phys- 
ical patterns as caused by tel escope focusing er rors, coma or other 
optical phenomena (see, e.g.. |jarvis et al.ll2008[) . This is essentially 
a separate, although important, issue: in the following investiga- 
tion the aim is merely to test whether Di{r) and D2{r) help in 
the correct identification of an appropriate fitting scheme fm for 
the simulated starfield, i.e. whether the most successful fit is based 
upon a fifth order bivariate polynomial, matching the level of input 
spatial structure. 

Nonetheless, the analysis still has practical relevance as it 
must be hoped that typical PSF patterns mostly fall within the space 
of possible starfields in the random prescription described above 
(otherwise polynomial fitting itself is likely to fail). In Sections|4]& 
[5]the testing will be extended to large sample of random starfields, 
precisely in order to explore the success of Di (r) and D2{r) for a 
variety of input 'true' signals. The remainder of this Section will be 
concerned with the success of these new diagnostics for the starfield 
of Figure [3] 
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Figure 3. Upper panel (a): Whisker plot showing the positions and ellip- 
ticities of the 2500 stars in the simulated 1 deg^ starfield, as described in 
Section im The simulated noise on each component of 'measured' elliptic- 
ity e is (Tjv = 0.015. The dashed lines show the boundaries of the 8 chips 
defined when performing an artificial 'chipwise' fit to the simulated data. 
Lower panel (b): Autocon'elation (e*e) in the same simulated starfield. 



3.2 Fits to the starfield ellipticities 

The ellipticity variation in the simulated starfield was then least- 
squares fit using each of a set of four different bivariate polynomial 
fitting schemes fm'- third, fourth, fifth (matching the input polyno- 
mial order) and sixth order simple polynomials, performing a fit 
to each component of ellipticity independently. The use of simple 
rather than Chebyshev polynomials for fitting is immaterial, as each 
can be formally expressed in terms of the other (at equivalent or- 
der) via exact linear transformations in the polynomial coefficients. 
The best-fitting model was found in each case using Singular Value 
Decomposition (SVD) as implemented by the IDL routine SVD- 
FIT. PRO, itself based upon the SVDCMP.F routine within Numerical 
Recipes dPress et al.lll992h . The SVD fitting process has the prop- 
erty of minimizing numerical round-off error and matrix singular- 
ity problems when attempting to solve underdetermined systems of 



equations, and thus even in the case of an overfitting fm will pro- 
duce the best possible solution. This is important as the diagnostic 
tests presented should be as insensitive as possible to numerical 
issues. 

So as to provide a clear example of overfitting, the starfield 
was artificially split into eight regions associated with hypothetical 
CCD chips. These 'chip regions' can be seen in Figure[3] bordered 
by dashed lines. This is done to illustrate the increased potential 
for overfitting when mo delling 'chipwi se', as is commonly done in 
weak lensing (see, e.g, IFu et alJIlOOSl who independently model 
the PSF anisotropy in each of the 36 CCD chips in the 1 deg^ 
CFHTLS-W field of view). However, such work only uses low- 
order polynomials for each chip, and it is also interesting to observe 
the behaviour of Di (r) and D2 (r) in cases of severe (unrealistic) 
overfitting. Fitting chipwise we also begin to explore the question 
of whether modelling schemes that cannot perfectly fit et, i.e. a 
'wrong' model, may be validated as practically sufficient given lim- 
ited data. For this chipwise fit we adopt schemes fm that use first, 
second, third and sixth order bivariate polynomial surfaces, each 
being fit to each chip independently. 



3.3 Comparison with simple modelling diagnostics 

Having fit the 2500 simulated starfield measurements of e using a 
set of models em (a;; fm), for a variety of different fitting schemes 
fm, the diagnostic functions D\{r) and D2{r) are calculated us- 
ing the formula in equation ([5). Measurements are divided into 
25 logarithmically-spaced angular bins between 7 arcsec and 1.4 
deg. This binning scheme was chosen as fairly representing both 
small and large scale information. By varying these values it was 
also verified that Di (r) and D2 (r) were stable in regards to this 
choice, which was found to be true once sufficient numbers of bins 
were used as to be able to explore small scale correlations. Uncer- 
tainties were then calculated as the standard error upon the mean 
value from all the pairs within each bin, therefore not taking the 
correlation in values between neighbouring bins into account. Cal- 
culations of the Di (r) and D2 {r) diagnostics, for each scheme /,„, 
can be seen in the right hand panels of Figure |4] for the global fit 
and Figure[5]for the chipwise fit. 

Whilst it should be noted that residual-residual correlations 
have already been us ed by some groups to rule out or justify 
PSF models (see, e.g ., 'Hoekstra"2004'; Va n Waerbeke et alJi200a 
iHoekstra et"ai]|2006l : Schrabback et al. 200I), and that these will 
be compared to Di (r) and D2{r) in the following Section, the left 
hand panels of Figures |4]&|5] show comparative examples of more 
simple diagnostics us ed in the past to display the results of PSF 
model fits (see, e.g., iHoekstra et al] 1 199a iHevmans et~aLl l2005l: 
ISchrabback et al.ll200'7ir The far left hand panels of Figures |4] & 
[5] show 'whisker plots' of e — em (referred to as corrected ellip- 
ticities) to depict the random nature of fitting residuals. As can 
be seen by comparison with the corresponding Di (r) plots, cor- 
relation and anticorrelation may exist that is difficult to accurately 
quantify by eye, although qualitatively there are traces of correla- 
tion in the third and fourth order fit whisker plots. A correlation 
analysis of some sort is nonetheless cle arly desirable, just as has 
been argued previously ( lHoekstrall2004h . The inner left hand pan- 
els depict the distribution of 'original' ellipticities e in comparison 
to the distribution of the corrected ellipticities e — em; once again 
these plots are more difficult to quantifiably interpret than Di (r) 
and D2{r), which show more markedly different behaviour with 
fm- The question remains as to whether these new diagnostics are 
behaving as predicted in Section|2] 
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Figure 4. Traditional weak lensing tools for displaying the starfield PSF model fit results (left panels) and the corresponding Di(r) and D2{r) diagnostic 
correlation functions (right panels). As described in Section lXT] the input PSF anisotropy et was a fifth order bivariate Chebyshev polynomial surface in each 
component of ellipticity. 



Examining first the global fit results shown in Figure |4] it can 
be seen that both Di (r) and D2 (r) show clear residual positive and 
negative correlations for the underfitting models, as predicted. Both 
diagnostics appear to be broadly consistent with zero for the fifth 
and sixth order fits. For the chipwise fits of Figure[5]the results are 
similar but show an interesting extra feature. While both diagnos- 
tics rule out a first order chipwise fit as clearly underfitting, and the 
second order more marginally, the third order chipwise fit shows 
slight evidence of anticorrelation on small scales. This is then seen 
more clearly when the fit is taken to sixth order chipwise, particu- 
larly for the Di{r) diagnostic. These results suggest an overfit to 
the data for third and sixth order chipwise fits, at least according the 
reasoning presented in Section |2!4l this is a reasonable conclusion 
given knowledge of the number of degrees of freedom in the initial 
model, and could not have been so easily diagnosed using current 
methods. 

Furthermore, the fact that neither the second nor third order 
chipwise fits show perfect consistency with zero suggests that none 
of the schemes chosen in this artificial chipwise splitting of the 



field is best suited to modelling the data, also a reasonable con- 
clusion. However, it may be that this apparent inconsistency is in- 
stead caused by chance and the fact that (m*m) (r) is no longer 
isotropic, since the artificial chips are rectangular. Nonetheless, 
even the possibility that Di{r) and D2{r) might allow general 
modelling schemes to be iteratively improved by correcting flaws 
such as the wrong choice of fitting function family, or the unneces- 
sary splitting into independent chips, is of practical interest when 
fitting to an et of unknown functional form. Fitting to an arbitrary 
underlying field using a 'wrong' (or, more accurately, incomplete) 
basis will be explored further in Section |5] in which polynomial 
fits will be made to randomly generated fields with only an average 
power spectrum specified. 

In summary, for the simple example presented in Figure |4] it 
appears that the degree of agreement with Di{r) — D2{r) — 
is a potentially useful aid to model selection when compared with 
simple diagnostic tools that have been used in the past. In Figure |5] 
both diagnostics help rule out models that would clearly be under- 
or overfitting, but there is no scheme that performs perfectly once 
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Figure 5. Traditional weak lensing tools for displaying the starfield PSF model fit results (left panels) and the corresponding Di{r) and D2{r) diagnostic 
correlation functions (right panels). These results are for the independent chipwise fits to each of the eight chip regions in the starfield. As described in Section 
13.11 the input PSF anisotropy et was a fifth order bivariate Chebyshev polynomial surface in each component of ellipticity. 



the field is artificially split into chips. This fact gives hope that cor- 
relation analyses of this sort might guide, in a directed manner, 
iterative improvements to modelling where there is no clear phys- 
ical motivation for selecting a given scheme (in the case presented 
it might motivate the decision not to model chipwise, for example). 
It is now instructive to compare the results of this analysis with the 
more complex diagnostics of PSF modelling that have been dis- 
cussed in the literature, which are also based upon correlations in 
residuals, to see what may be added by the approach presented here. 



3.4 Comparison with the aperture mass dispersion and other 
related correlation diagnostics 

The use of residual correlation diagnostics in a simila r form to 
- Di(r) and D2(r) is not new, having been advocated by iHoekstral 
( I2OO4I) in the form of the ape rture mass dispersion estimator 
Mjr) defined in equation (16) of[Schneider et alj f2003) (see als o 
ICrittenden et"^l20()2l: ISchneider. Van Waerbeke & Melliej|2002h . 



a filtered combination of and (,-{r) that gives an unbiased 

estimator of the cosmological aperture mass dispersion: 



r'dr' 



(Ml) (r) 



and 

(Mi) (r) = - ' 



(37) 



,(38) 



where t he functions T±(x) are non-zero only for x < 2 and are 
given in lSchneider. Van Waerbeke & Mellieil Jiool) . The aperture 
mass dispersion has the useful property that it allows a decompo- 
sition into 'E' and 'B' mode contributions (which are (A/ap) (r) 
and (r) respectively) to the correlation signal, the former 

of which only is produced by a simple scalar mass potential (al- 
though B-modes can be created by contamination to the shear cor- 
rection, intrinsic alignments of source galaxie s etc., se e^Schneidej 
12006). This measure was then employed by IVan Wa erbeke et al] 
I2OO5I) and lHoekstra et al.l ( l2006l) to find optimal schemes for mod- 
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Figure 6. Aperture mass dispersion statistic for the residuals of the fits in 
Sections 13.21 & l33l corresponding to the diagnostics plotted in Figures |4] 
&|5] The crosses resembling addition signs represent the E mode signal 
and those resembling multiplication signs the B mode signal, which are 
offset slightly along the abscissa for claiity. Left hand side panels are for 
the simple polynomial fits to the whole field, whereas the light hand side 
panels show the dispersion in the chipwise fit residuals. 



elling the spatial variation of the PSF anisotropy, aiming towards 
^A{r) = Q for the residuals between modelled and measured PSF 
anisotropies. In both studies the chosen PSF fitting scheme was that 
which minimized the aperture mass dispersion of its residuals. 

In Figure |6] the aperture mass dispersion is plotted for the 
residuals of the fits described in Sections 13.21 & [331 correspond- 



ing to the diagnostics plotted in Figures |4] & [S] The results are 
similar to those of D-i{r) and D2{r), in that there are clear cor- 
relations for the underfitting cases. The signal to noise is reduced, a 
known feature of this statistic (e.g., Hoekslxa et al. 2006; Schneideil 
l2006l; l Fu et al.ll2008h . and a means of positively identifying over- 
fitting versus underfitting is less clear. This is because the aper- 
ture mass filtering of ^+ (r) and ^_ (r), which provides a desirably 
clean statistic when considering lensing due to physical mass distri- 
butions, makes the interpretation of correlated modelling residuals 
(via simple arguments such as those presented in Section [2J more 
complicated. One can say that correlation or anticorrelation exists, 
but saying which and why is less clear. 

There is a suggestion in the bottom right hand side panel of 
Figure|6] for the sixth order chipwise fit, that the negative values of 
(Map)^^^ are an indication of the overfilling model in the same 
way as the negative values of Di{r) and D2{r) at small scales; al- 
though with apparently less overall signal than Di(r). Justifying 
this hypothesis using extensions to the arguments of Section|2]may 
be possible, and would immediately i dentify a clearly over fitting 
PSF model in the study performed by iHoekstra et al.l feOOeh (Fig- 
ure 7 of that work): a not impossible finding given their indepen- 
dent fitting of second order polynomials to each of 36 chips across 
the CFHTLS Megacam field. Whether this sheds more insight than 
D\ (r) and D2{r) upon the PSF model quality is unclear, however, 
and even unlikely given the lower signal to noise of the aperture 
mass measures (although see lFu & Kilbingerll2009t) . 

Nevertheless, as (Map) (or the more rec ently- d erived 

ring statistics (TZTZ) „ i „, see [Schneider & Kilb ingeil |2007|; 
lEifler, Schneider & Kraus3l2009l ; IFu& Kilbingen |200 9') provides 
a model-independent method for E/B mode decomposition, it is 
in many ways a preferred choice for constraining cosmological 
parameters. Therefore, it will undoubtedly be of use to express 
residual correlations in the PSF model in terms of such quantities 
to quantify the contamination to the measured shear signal (e.g., 
[lloekstra 2004; see also Section |6). But in the diagnosis of poor 
modelling, and in the directed manner in which simplifications or 
increases in complexity to the PSF model can be motivated, the in- 
terpretational clarity of Di{r) and D2{r) is an advance upon the 
use of the aperture mass dispe rsion. 

ISchrabback et al. I l l2007 !) also present a residual correlation 
analysis as a diagnostic of PSF modelling. These authors demon- 
strate that their chosen PSF model minimized the two functions 
\^+{r) +^_(r)| and — ^-{r) \ when measured upon model- 

data stellar ellipticity residuals, after randomly drawing stars from 
dense stellar fields (estimating the impact upon cosmic shear by 
scaling to an equivalent shear correctio n using randomly-draw n 
values from a galaxy population; see I Schrabback etal]|2007h . 
Overfitting was not a concern for these authors, who built a family 
of robust and detailed HST PSF models using dense stellar fields 
and then selected from this family via a maximum-likelihood match 
to the far fewer stars available in the galaxy survey images. As such, 
no attempt was made to interpret the sign of the resulting functions 
and the absolute values alone sufficed as a diagnostic. Again, the 
interpretability of Di{r) and D2{r) in terms of over- and under- 
fitting, via the arguments in Section |2] is of significant additional 
value. 



3.5 Quantifying the agreement with Di (r) = D2{r) — 

These results are interesting, but in order to test their repeatability it 
will be necessary to find some method of compressing the data and 
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quantifying thie qualitative visual appraisal so far conducted. Ide- 
ally there would be a means of ascribing a single, easily-calculated 
number to the Di{r) and D2{r) results, describing how well a 
given model matches the desired Di(r) = behaviour. An ob- 
vious example is a chi-squared-like measure, but a true chi-squared 
is impossible without knowledge of the covariance between bins 
of Diir), and these covariance matrices may in general only be 
estimated post hoc (often imperfectly) via a statistical jackknife 
or bootstrap. The calculation of even a single D\(r) or D2{r) 
takes some short time, varying with the square of the number of 
data points, and so these processor-intensive bootstrap techniques 
quickly become prohibitively expensive. In this Section we exam- 
ine practical possibilities for a cheaper alternative to a full chi- 
squared measure of the Di{r) — hypothesis. 

To assign simple numbers to these results, in effect searching 
for a better proxy to the 'appraisal-by-eye' performed in the previ- 
ous Section, the following quantities are defined: 

(rfO. - (39) 

= ]^E('^>)^' (40) 

j 

(^') - (41) 

J V bins 

J 

Here i = 1, 2 so as to specify the Di{r) or D2{r) diagnostic re- 
spectively, j denotes each discrete bin of angular scale rj , and the 
error estimate on each Di{rj) is denoted as (t_d. (r^). As discussed 
above, it should be noted that the values for neighbouring bins of 
Di{r) are correlated and the uncertainties ctu. (rj) do not take this 
into account: it would thus be dangerous to associate equation ( I41t 
with any sort of true chi-squared measure of the goodness of fit to 
a desired Di = scenario. 

However, use may be made of the fact that the correlation be- 
tween bins for the diagnostics must be expected to be minimized 
for those successful models that approach Di — 0: this can be sim- 
ply argued by considering that m is expected to become approxi- 
mately random in these cases. Also of use is the fact that the most 
important task at hand is merely to select a best-fitting model using 
a single number that quantifies this success. It might still be hoped 
that ( 1401 ) and i4ll are useful as a way of ranking competing models. 
The overall normalization of (di) and (d^) for failing cases would, 
when ranking, be less important than the relative normalization as 
compared to successful cases. 

Another means of quantifying the agreement with Di(r) — 
is via the the Kolmo gorov-Smimov (KS) test (see, e.g.. |Press et al.l 
ll992l : lLuptonlll993l) . suggested to the grateful author by the anony- 
mous referee. In the null hypothesis of a well-fitting model we may 
approximate that the individual values of are each indepen- 
dently described by a Gaussian distribution with zero mean and 
tmit variance. The maximal difference between the empirical cu- 
mulative distribution function (derived from the data) and the cu- 
mulative distribution function of the null hypothesis can then be 
used to assess the goodness of fit. The probability PKs(di) that the 
null hypothesis could produce at least the maximal difference seen 
may then be calculated from a series approximation to the Kol- 
mogorov distribution (although note that the use of this distribution 
is not strictly accurate, given that the 'data' {di)j we use are de- 
rived quantities, and so this meas ure should be rig htly interpreted 
only as an approximate guide; see lPress et al.lll993) . 

The values of each of the (di), (dj) and PKs(di) statistics, 
for each fit to the starfield, are given in Figures |4]&|5] It would be 



naively expected that the best fitting fm would show minimum val- 
ues of (di ) and | (di) |, and a maximum PKs(di): this is seen in the 
case of Di{r), but the D2{r) results marginally favour the over- 
fitting sixth order polynomial in this case. All statistics strongly 
rule out lower order fits. For the chipwise fits the situation is more 
complex, due to their being no fitting model which shows clear 
consistency with Di{r) = 0. Second and third order fits are vari- 
ously preferred. As discussed in Section [33] this is perhaps due to 
there being no /m in this artificially split case that can reproduce all 
features of the ellipticity field without some overfitting redundancy. 

All three statistics show some promise in being able to ap- 
proximately describe the extent to which Di (r) = 0. More sophis- 
ticated approaches could certainly be explored, particularly given 
the information regarding the expected signatures of under- and 
overfitting described in Sections [2. 3 1 & |2^ This extra information 
might be perhaps be used within a Bayesian framework to select 
models, particularly if the modeller wished to rule out either an 
under- or overfitting model at any cost. This may be fertile ground 
for future work, as it is clear that the three statistics (df ), | (di) |, 
and Pks (di) were chosen above all for their immediate simplicity 
and clarity. 

These results lead to the possibility of asking another question: 
can model selection via the statistics described here be successfully 
repeated for a variety of input et fields? A single example field is 
hardly strong evidence for the utility of Di{r) and D2(r) in more 
general cases, so many more fields must be tested. The broad suc- 
cess of the three simplifying statistics described in this Section of- 
fers hope that the author might be spared from the need to visually 
inspect many plots of Di{r) and D2{r), and avoid the inevitable 
subjectivity in such an approach. Moreover, the necessity and diffi- 
culty of performing a full and stable statistical jackknife calculation 
(in order to estimate covariances) may be avoided. The following 
Section tests the success of (d?), I (d.) I and PKs(di) as cntena 
for correctly identifying appropriate modelling order using a large 
number of randomly generated et anisotropy fields. 



4 MONTE CARLO TESTING USING A SUITE OF 
SIMULATED ANISOTROPY MAPS 

The repeatability the results of Section[3] which hinted at the utility 
of _Di = as a model selection criterion, will now be tested using 
a large number of simulated e fields. To test the model selection for 
input physical signals of varying spatial complexity, each compo- 
nent of the input et is modelled as an nth order bivariate Cheby- 
shev polynomial surface as described by equation l |35l l, where the 
upper limit of the double sum is now j + k < n = 1,2, ... ,7. 
For each n a suite of A^mc = 500 randomly generated starfields is 
then created using a procedure directly analogous to that described 
in Section im Then, for each of the 500 simulated starfields of a 
given input order n, a set of best-fitting models for the anisotropy 
map are constructed as described in Section [T2l this time, how- 
ever, a greater range of bivariate polynomial fitting schemes fm are 
explored, ranging in fitting order from m = 1, . . . , 7. 

The same process is also performed using the artificial chip- 
wise division of the field of view when fitting. Finally, the resulting 
D\ (r) and D2 {r) statistics were calculated as described in Section 
l3.3l for each input order n and fitting order m, for each of the 500 
random starfields, and the (d^)^.^, | (di) |niin and P^^{di) crite- 
ria discussed above were then used to select the most appropriate 
modelling order. 
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Figure 7. Number of model schemes selected as best fitting using the 

(^l)min ^^PP^^ (^Dmin ^^PP^^ ^^^ht), |(f^l>|min (mid left), 

I {d2 ) Imin (mid right), P^^^ (di ) (lower left) and P^^''{d2 ) (lower right) 
criteria, as a function of input polynomial order n. The total number of fields 
simulated at each input order was A''mc = 500. Figures 2]&[8] were cre- 
ated using a modified version of the SHAPELET S_PLOTJMAGE.PRO routin e 
from the publicly-available Shapelets software jMassev & Refregiej|2005l) . 



4.1 Results 

The number of times a global fitting scheme of order m was cho- 
sen as best representing the simulated starfield of input order n 
is shown in Figure |7] quantifying the success of the (d?)^;^. 
I (rfi) |min and P^l^'^idi) criteria for this simple Monte Carlo test. 
The same results are shown for the chipwise fits in Figure[8] 

For the range of input models and statistics tested it appears 
that Di (r) is typically a cleaner and more reliable model selection 
diagnostic than D2{r), which in both the global and chipwise fits 
is noisier and appears to show a biased relative preference for over- 
fitting models in most cases, (di)^^^.^^ and | (di) |min show striking 
success in selecting the correct fitting order in Figure |7] but not so 
their D2{r) counterparts. The performance of P^l^^idi) is more 
disappointing overall and seems insensitive to identifying overfits 
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Figure 8. Number of chipwise model schemes selected as best fitting us- 
ing the (df)^.^ (upper left), (d^)^-^ (upper right), | (di) l^in (mid left), 
I (^2 > I min (mid right), P^^" (di ) (lower left) and P^''" {^2 ) (lower right) 
criteria, as a function of input polynomial order n. The total number of fields 
simulated at each input order was A^mc = 500. 



when fitting globally, although in Figure [8] there is evidence of it 
correctly ruling out the more extreme chipwise overfits, at least for 
Di{r). I (di) |min appears to show moderate preference for lower 
order fits than (df particularly at lower orders, perhaps due to 
the close cancelling of correlation and anticorrelation on different 
scales (c.f. Section IZJl l. Nonetheless, all statistics appear to rule 
out underfitting models successfully. 

The chipwise results show better rejection of overfitting mod- 
els, and therefore greater overall agreement between the results 
for different statistics. This is not surprising when one considers 
that for chipwise fits the number of degrees of freedom in the fit- 
ting model increases more rapidly with increasing order m than in 
the global case. The comparison of the chipwise and global results 
therefore suggests that, for those statistics which failed to rule out 
overfits, the problem was more of relative sensitivity than a total 
failure. The P^^{di) statistic performs the worst, and lPress et al.l 
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( Il992h suggest a possible reason for this: the sensitivity of the KS 
test of deviations from the c.d.f. (di) is not independent of di, and 
is in fact most sensitive around the median value. This makes the 
test least sensitive in the wings of the probability distribution, and 
hence to the outliers which may give the cl earest indication a poor 
fit. This is one plausible explanation, and IPress et ^ il992h de- 
scribe possible alternatives to the KS test that attempt to mitigate 
this problem. Another reason might be the incorrect use of the Kol- 
mogorov distribution in the test, since {di)j are quantities derived 
fro m the data rather than direct data themselves (see Section [3751 
also |Pressetal]|l992h . 

These results leave open an avenue for potentially valuable 
further study, as an optimal way for identifying Di (r) — D2 (r) = 
is clearly not yet found. Despite this fact, the results of Figures 
|7]&[8]are extremely encouraging, for (^1)^^^;^^ and I (di) |min in 
particular. The results D2{r) were generally less successful, and 
seemed to show an insensitivity to overfitting in particular. Possible 
reasons for this are now briefly discussed. 



4.2 Understanding the relative failure of D2 {r) 

In the Monte Carlo tests performed it appears that the selection cri- 
teria based upon D2{r) show some shortcomings that need to be 
explored. Firstly, they often prefer overfitting models when com- 
pared to the D\ (r) results (biased when compared to the truth for 
Figure It}. Secondly, both (di) and [ (d2) | are noisy. This latter 
can perhaps be partially explained by the fact that D2{r) is itself 
often more noisy than D\{r), a natural consequence of the value 
and variation of e being typically larger than that of e — em (see 
equations 1131 and [T4t : the biased results warrant further investiga- 
tion. 

Complete answers as to why D2{r) performed relatively 
poorly in this simple experiment almost certainly lie outside the 
capability of this paper, as this behaviour may depend non-trivially 
upon many factors. Possible contributions could be: the number of 
stars simulated per field; the overall signal to noise for the e field; 
the geometry of the chosen field of view; the nature of the typi- 
cal pattern described by the simulated et (the chosen polynomials, 
even when randomly generated, do not come close to being able to 
describe arbitrary surfaces); the degree of variance and covariance 
within and between bins of D2{r). Whilst all these factors may po- 
tentially be limiting in the correct circumstance, the last is worthy 
of some further discussion in particular as it is one aspect in which 
D2{r) may differ significantly from Di(r). 

Evidence of overfitting is generally seen at small scales, and it 
has been seen that on large scales Di ~ even for drastic overfits 
(e.g. Figure |5j. At these scales the quoted errors upon D2{r) will 
still be large when compared to those for D\{r), because of the 
typically greater values of e as compared to e — e^, meaning that 
contributions to (d^) on these scales are overly suppressed. More- 
over, the use of (d^) as a means of ranking competing models will 
work best in situations where the covariance matrix is totally diag- 
onal, and will correspond in such cases to a chi-squared-like mea- 
sure of goodness of fit to Di = 0. Conversely, strong off-diagonal 
values might produce exactly the results seen in Figures|7]&[8]for 
the D2{r) diagnostics. If false, the assumption of weakly corre- 
lated uncertainties between bins, which is implicit for the utility of 
(di ) and | (di) |, will lend undue weight to the supposed success of 
Di = on large scales and therefore undue credence to the overfit- 
ting model itself. If the covariance matrix for D2{r) contains sig- 
nificant off-diagonal terms, even in the case of a successful model. 



this may be cause for the systematic preference towards overfitting 
models seen in Figures|7]&[8] 

In that case, however, it must also be seen if there is a system- 
atic reason why the covariances of Di (r) and D2 (r ) differ in their 
diagonality. One important difference is apparent from the very def- 
initions given in equations l |13t and l ll4t : the first diagnostic is the 
residual-residual autocorrelation, whilst the second is a measure 
of the data-residual cross-correlation. It may be that D2{r) will 
therefore suffer from strong covariances even for cases approach- 
ing D2 (r) = 0, due to the physically correlated nature of the data 
e, whereas such correlations in Di (r) will inevitably decrease as 
the model tends towards Di (r) — 0. Another difference may lie in 
the angle averaging, performed over all pairs separated by distance 
r, implicit in the definition of the correlation functions thus far used 
in this paper (see, e.g., equations |3] and |5). It can be imagined that 
D2{r), being more strongly dependent upon the non-isotropic field 
et, might suffer from greater uncertainty and even bias due to the 
angle averaging that is implicit in the way Di{r) and D2{r) are 
calculated. 

These are plausible explanations for the effects seen but, un- 
fortunately, this discussion will remain essentially un-concluded 
without further investigation. Estimating post hoc the covariance 
matrices of correlation functions such as D2 (r) is expensive with 
current computing resources and, as the covariance of D2{r) will 
depend non-trivially upon the starfield et, the calculation would 
need to be made many times within the current simulation frame- 
work. Pessimistically, even with such work we may in fact be learn- 
ing more about the properties of any simulated et than about Di{r). 
Despite these issues regarding D2{r), it has certainly been demon- 
strated that both Di{r) show some potential for diagnosing mod- 
elling quality, and that Di{r) appears outstandingly successful in 
the tests so far conducted. A next step is to enlarge the space of 
potential et fields to include more general, arbitrary structure, and 
to see how Di{r) and D2{r) perform when a perfect fit may no 
longer be possible. 



5 FITTING ARBITRARY SPACIAL PATTERNS 

So far in this paper the testing of the Di (r) and D2 (r ) diagnostics 
has been for cases in which a perfect fit to the data was possible 
at some modelling order. The chipwise fitting made the situation 
somewhat more interesting by balancing this against an increased 
propensity to overfit: for example, whilst it might be necessary to 
fit a fifth order polynomial chipwise to formally recover all aspects 
of an underlying fifth order global et, it may not always be justified 
by the data available. This was reflected in the results of Figure[8] 
which preferred lower order chipwise fits even though such models 
might not be able to perfectly reproduce all aspects of the input 
PSF variation. In this Section we take the analysis one step further 
by introducing more arbitrary spacial patterns, to see whether the 
diagnostics can differentiate between more general variation in the 
degree of spatial structure. 

Once again a Monte Carlo approach is taken and we build 
three suites of A^mc = 500 randomly generated starfields. For 
each field, each component of the underlying et is modelled as a 
Gaussian random field with a specified average power spectrum. 
These can be constructed by summing random phase Fourier modes 
across the 1 deg^ field of view 

JVlimil 

(et)i = ^ CinmCOs{knX + (j>n)cOs{kmy + (l)m), (42) 

n,m — 1 
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where kn ~ nn deg~^, and i = 1, 2 denotes the real and imagi- 
nary parts of et respectively. The value Numit = 300 was chosen to 
ensure that structure was fairly represented well below the typical 
separation for 2500 objects in a 1 deg ^ field (1/50 deg), although 
this slowed the field generation somewhat. and (pm are chosen 
to to be random uniform variables in the range [0, 2tt] and, having 
defined k'^ — + k'^, the coefficients Ci„m are then drawn from 
Gaussian random variables of zero mean and variances satisfying 
the constraint that the ensemble average power spectrum is given 
by a power law 



{CinmCin'^') — Ak 



(43) 



For each of the three Monte Carlo starfield suites, different choices 
were made for the value of the power law slope a: 11/3 and 11/6 
(motivated by the Kolmogorov spectrum for atmospheric turbu- 
lence, e.g. Sasiela 1994, relevant for ground-based lensing), and 
the intermediate value 1 1/4. For each a the normalization A was 
accordingly set so as to give equal power in the lowest order mode, 
specifically setting \/(c^^ = 0.01. This normalization, once stel- 
lar ellipticities were sampled at 2500 randomly-distributed points 
and a Gaussian noise of ajv = 0.015 added to each component 
(c.f. Section lsTt . once more recreated simulated starfields of real- 
istic anisotropy for the CFHTLS-W. 

As in Section [4] these simulation starfields were then fit us- 
ing simple bivariate polynomials of order 1, . . . , 7, globally. The 
Di (r) and D2 (r) statistics were calculated for each fitting order, 
and the {di)^^^, \ (di) |min and P^^^{di) criteria used to select the 
most appropriate fitting model. The results of these fits for Di (r) 
and D2{r) can be seen in Figures[9]&[T0] respectively. 

Most importantly, there are clear differences in the distribution 
of preferred fitting order as a function of power spectrum slope: 
as would be hoped the shallowest ct — 11/6 case systematically 
prefers the highest order fits, suggesting even that higher than sev- 
enth order fits might be necessary in many of the starfields. The 
a — 11/4, 11/3 cases peak lower (except for P^g^{di), which 
performs poorly again), with the steepest power spectrum prefer- 
ring the lowest order fits. The {di)^^^ and | (di) \min measures all 
seem to be able to differentiate well between the degrees of spatial 
structure in these arbitrary underlying models. The tendency for 
I (di) |min to prefer lower order fits than (d^ ) is seen again, as is 
the tendency for D2 (r) to prefer higher order fits relative to Di (r ) . 
It should be noted that this latter behaviour is strikingly modest 
when compared to the results of Section|4l perhaps due to a lesser 
degree of spatial correlation in the randomly-generated underlying 
fit (see discussion in Section |42l l. 

The results of Figures|9]&[T0] when taken alongside those for 
the polynomial starfields of Section |4l give compelling evidence 
for the utility of Di (r) as diagnostic tools for PSF modelling. Here 
we have shown that they are able to discriminate between physical 
systems of varying complexity despite unknown or arbitrary un- 
derlying forms, and in Section|4]it was shown that when the model 
approaches the 'correct' form this too can be identified. From sim- 
ple arguments the diagnostics provide a way of making directed, 
iterative, and possibly automated improvements to both modelling 
accuracy and stability. Whilst the three descriptive statistics sug- 
gested in Section [375] remain imperfect in many aspects, there is 
scope for improving them. All of this offers hope that Di (r) will 
be a useful tool for systematically improving PSF modelling in fu- 
ture weak lensing studies. With this in mind, we now turn to a brief 
discussion of how these diagnostics may be related to requirements 
for cosmic shear measurement accuracy. 
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Figure 9. Number of model schemes selected as best fitting using the 
('^i)min (upper panel), | (di) In^in (middle panel), and P^^'^{di) (lower 
panel) criteria, as a function of the input et power spectrum slope a. 
The total number of fields simulated for each input power spectrum was 
A^MC = 500. 



6 RELATION TO COSMIC SHEAR PREDICTIONS 

The results presented so far in this paper compare various fitting 
models, but no attempt has been made to propagate these results 
through to the impact upon cosmic shear measurements. In this 
Section we describe relevant results in the literature, and show how 
these allow values of Di (r) in particular to be related to the im- 
pact upon cosmic shear systematics. This will be necessary to help 
define the limits and requirements for the quality of PSF modelling 
for future surveys. D2{r) is less useful in this respect as it cross- 
correlates the modelling residuals with the telescope PSF e varia- 
tion, a filtering that on average will be unmatched to any cosmic 
signal. 

In what follows s i gnific ant use is made of results from 
IPaulin-Henriksson et al.l ( 2008h . adapted so to propagate the cor- 
related effects of poor spatial modelling of the PSF variation into 
an impact upon measurements of cosmic shear These authors use 
a simple description based upon unweighted image moments to ar- 
gue that, to first order in the quantities concerned, the systematic 
error Se"^^ upon a measured galaxy ellipticity will be given by 



5e' 



(figal — epsF, 



^fipSF 



(44) 
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Figure 10. Number of model schemes selected as best fitting using the 
(•^Dmin ('JPP^'^ panel), I (1^2) Imin (middle panel), and P^^^{d2) (lower 
panel) criteria, as a function of the input et power spectrum slope a. 
The total number of fields simulated for each input power spectrum was 
AfMC = 500. 



where egai and epsF are the ellipticity of the galaxy and the PSF in 
this region of sky, 7?gai and Rpsf are the respective angular radii. 
SRpsp and SepsF represent uncertainties in these aspects of the PSF 
due to poor or unstable modelling. Changing to the notation of Sec- 
tion [2] Sepsp can be immediately identified with the modelling in- 
accuracy m. Considering only the second term (as the Di diagnos- 
tics cannot directly shed light on Slipup) we may write the associ- 
ated error in the measured shear due to the poorly modelled PSF 
anisotropy as 



RpSF 



(45) 



as in 

7 ^ 



having defined the shear susceptibility factor P~' 
iPaulin-Henriksson et al. I( l2008l) (these authors take a value of P 
1.84 as typical for a distribution of measured galaxy ellipticities). 
The measured shear for a region of sky may be then written as 
7 = 7t + Sj"^" + Sj^, where 7t is the 'true' shear and ^7^ is a 
random, assumed-unbiased noise term. Using the expression ^ to 
define a shear correlation and taking equation i45l together with 
the inequality ( I33I ). we may approximate the systematic impact due 
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Figure 11. Di (r) fitting results for the second order (squares) and third 
order (triangles) chipwise fits to the starfield of Section |3](reproduced from 
Figure|5). Overlaid with the dashed line is the tolerance on Di (r) that must 
be met to ensure that systematic errors due to PSF variation to contribute 
less than 5% of the ACDM cosmological signal for source galaxies at 
redshift z = 0.7, calculated as described in Section|6] 



to a poorly modelled PSF anisotropy as 



{m*m) (r) 



(P7)2 



Rps,p 

Rga\ 



Dijr) / [RpsfV 

{p^Y \ \ i?g, ) 



(46) 
(47) 

(48) 



Following the arguments of Section |241 the last expression will 
tend towards an identity for underfitting models but place an upper 
limit on the systematic contribution from m in the overfitting case. 

The expression ( I48t can also be reversed to define a tol- 
erance limit upon -Di(r) for a given 'acceptable' level of sys- 
tematics when fitting m odels to the spatial variation of the PSF. 
IPaulin-Henriksson et al. argue that the steepness of the 

galaxy size distribution justifies the approximation that, typically, 



Rpsv 



(1/1.5)* 



(49) 



together with P^' ~ 1.84 this allows limits to be placed upon ac- 
ceptable values of Di(r), relative to the expected cosmological 
signal £,\{r). Using these values. Figure [TT] shows the tolerance 
upon Di (r) that must be met to ensure that anisotropy systemat- 
ics do not exceed 5% of the ACDM cosmological signal on any 
scale for a population of lensing source galaxies aX. z — 0.7 (a typ- 
ical median value for current and planned wide-a rea lensing sur- 
veys such as CFHTLS-W and Pa n-STARRS: e.g. IFu et alj[2003; 
iHoekstra et al.''200fr; 'Kaiser '2004*). This prediction was calculated 
using the Smith et al. (2003) non-linear matter power spectrum for 
a flat, ACDM cosmology with = 0.25, = 0.75, h = 0.7, 
power-law spectral index n,, — 1 and normalization as — 0.8. 
Over-plotted are the second and third order chipwise fitting results 
of Section|3]and Figure[5] 

The arguments presented here are relatively simplistic, and the 
impact of residual PSF anisotropy variation upon shear estimation 
will also generally depend upon the correction method used. How- 
ever, the approximate tolerance limits they place upon the quality 
of PSF modelling are a useful first guide, and the accuracy of these 
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limits could easily be improved further by simulating erroneous 
PSF correction through any given shear measurement pipeline. 



7 GENERAL MODELLING APPLICATIONS 

A final question is whether the technique can be extended into 
spheres beyond the characterization of anisotropic PSFs for weak 
lensing. Another possible application of the analysis, also within 
the weak lensing context, is illustrated by Figure [12] Here a 
shapelet model of a galaxy, imaged in t he F606W band of the Hub- 
ble Space Telescope Ultra Deep Field jSeckwith et alj|2006h . can 
be seen compared to the original image and to the map of resid- 
uals between the two; the shapelet decompos ition was performed 
using the publicly available code described 

bylM assev & RefregieJ 
(l2005h . The residual map clearly shows correlated features but, de- 
spite this, the galaxy image passed through the automated, iterative 
fitting routine o f lMassev & Refregied ( I2005h without any error er- 
ror flags. Moreover, the model passed with a reasonable reduced 
chi-squared of = 1.02087 as compared to a good fit-expected 
value of — 1 ± 0.0165, calculated via the 3657 degrees of 
freedom in the model (number of pixels minus number of model 
coefficients). In fact, this low value of chi-squared may be sensi- 
tive to an erroneous overestimation in the automatic calculation of 
the background sky noise used by the shapelet software, but this 
simply highlights an important limitation of such techniques when 
the uncertainty in the data is but imperfectly known. The diagnostic 
tools presented here are less sensitive to ignorance about the uncer- 
tainty on individual measurements, and may be used even in the 
total absence of such knowledge. 

Thus motivated, a simple generalization of the technique into 
other potential applications is now discussed in brief. The correla- 
tion of two complex functions / and g in an n-dimensional space 
can be defined as 

[f*g]{r)^ f f*{x)g{x + r)d"x. (50) 

Jr 

In the case of the correlation functions defined in Section lZT] / and 
g would be formed from simple combinations of the etan and ex 
quantities defined by equation Once more, a general function 
g{x) may be defined as a field that describes a number of discrete, 
noisy samplings of an underlying 'true' field gt{x); this quasi-field 
represents the measurements: 

g{x)^gt{x) + N{x). (51) 

As before, A'^ will be assumed to be stochastic noise. Similarly, a 
model fit to these measurements can be written as 

gm{x) = gt{x) +m{x,gt,N- fn^) (52) 

where m is again the inaccuracy and /m labels the modelling 
scheme chosen to represent g{x). 

As in Section im a starting point is to then make simple as- 
sumptions about the properties of the noise for the data being con- 
sidered, namely that 

[N*N]{r)= I N*{x)N{x + r)d"x = (53) 
and 

[g,*N + N*gt]{r) = 0. (54) 

Given these assumptions, and in direct analogy to the functions de- 
fined in Section l22l the following two diagnostic functions can be 



defined in general: 

Di(r) = [(<? - ffm) * (5 - ffm)] (r) (55) 

= [m-km](r) ~ [m-k N + N -km]{r), (56) 

D2(r) = [g*(g- gm) + {g - 5m) * g] (r) (57) 
= - [m ★ + fft * m] (r) 

-[m*iV + 7V*m](r). (58) 

These functions should tend to zero for all vectors r if the model 
fit employed is both stable and accurate, and will have predictable 
behaviour for over- and underfitting models in a way directly anal- 
ogous to the results of Sections [2.3l & |2.4l If the noise assumptions 
of equations ([53} and l |54| ( are broken then the functions above will 
be expected to be non-zero, but tending towards a form that may be 
derived from observations of pure noise. 

The use of -01(7") and D2{r) may provide valuable clues to 
ways in which the modelling of galaxy images such as Figure [T2l 
may be improved; one immediate example might be in the quan- 
tified selection of a more appropriate basis set than the Gauss- 
Hermite polynomials used by the shapelet method ( iNgan et al] 
[2008). However, there is no reason why the method must be re- 
stricted to weak lensing analyses, as the quantification and sup- 
pression of correlated residuals is surely a justified concern wher- 
ever physical data is being described by a best fitting model. Cor- 
relations in residuals between neighbouring data points should be 
examined wherever best-fitting models are being used to represent 
physical data, including but not limited to CMB temperature power 
spectra, supernova distance-redshift curves, or determinations of 
the matter power spectrum from galaxy clustering. However, such 
matters are clearly a topic for further investigation; the overall find- 
ings and results of this work will now be discussed. 



8 CONCLUSIONS 

A simple theoretical framework for an interpretation of statisti- 
cally correlated modelling residuals has been presented, and de- 
veloped into a pair of independent two-point correlation function 
diagnostics for the specific application of PSF modelling in weak 
lensing. The Di (r) and D2 (r) diagnostic functions have been de- 
fined as the autocorrelation in residuals and the data-residual cross- 
correlation, which should tend to zero on all scales for good mod- 
els. Visual inspection of these functions has been shown to give 
sensitive insight into model selection for a single simulated starfield 
designed to mimic typical telescope PSF patterns. This visual in- 
spection of the diagnostics not only quantifies the success of a given 
best fit model, but also provides improvements to be made in a di- 
rected fashion via the simple distinguishing features common to 
both underfitting (Di (r) expected to be both positive and negative 
as a function of r) and overfitting models (Di{r) negative only, 
most likely on small scales). 

The analysis was then extended to a large suite of randomly 
generated starfields, and simple quantifications of the extent to 
which Di{r) = were shown to be able to select appropriate 
orders of modelling complexity very successfully. The results for 
D2{r) were less successful at this stage, but this may be partly ex- 
plained by the greater covariance expected between neighbouring 
bins for this correlation function. For the arbitrary et fields of Sec- 
tion[5]the two performed more similarly, although with D2{r) still 
preferring slightly more complex fitting surfaces. Nonetheless, the 
success of the Di (r) diagnostic in particular provides strong evi- 
dence that the method may be used as a means of improving mod- 
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Figure 12. Galaxy image, model and residuals map for a galaxy fro m the F606W filter Hubble Ultra Deep Field te eckwith et al.ll2006l) . The image model 
and plots were made using the publicly-available Shapelets software iMassev & R efregier 2005). In this example the galaxy shape model passes through the 
automatic shapelet modelling process with no error flags, and a reasonable reduced of 1 02087 (this is calculated using the individual pixel values, treating 
each pixel as independent and giving a total number of degrees of freedom of 3657). 



els of the spatial variation of the PSF in weak lensing analyses. The 
diagnostics successfully differentiate the stability and accuracy of 
competing models in cases where the chosen fitting basis both can 
and cannot fully reproduce the underlying et. In cases where vi- 
sual inspection and full covariance calculations are possible when 
making modelling choices, i.e. if D\ (r) and D2 ir) need not be cal- 
culated many times, both diagnostics may give useful insight into 
the properties of the model and scales on which the fit can be given 
greater or lesser freedom. 

Such properties are clearly of great relevance to the analysis of 
forthcoming weak lensing surveys, and so simple arguments were 
given for how to relate the D\ (r) diagnostic to precision require- 
ments for the fundamental cosmic shear observable £,\(r-). Finally, 
the start of a generalization of the technique into more wider mod- 
elling applications was discussed. This simple work gives hope that 
the technique might be of use in spheres outside weak lensing. 

However, the description presented is basic, largely empirical, 
and leaves many questions unanswered. Placing the analysis of the 
D\ and D2 diagnostics on a truly statistical level, by truly estimat- 
ing the probability of a given Di (r) in the null hypothesis of a well- 
chosen model, would be a topic for useful future investigation. The 
KS test analysis attempted in this respect clearly falls somewhere 
short of this aim. Such analysis may inevitably be much restricted 
by assumptions that will need to be made about the properties of the 
noise upon data and, perhaps less tractably, the properties of the 
underlying physical reality gt{x) that is being modelled. Despite 
these difficulties, such work might allow the technique to do more 
than simply rank competing models by quantifying agreement with 
Di{r) = 0, which is where the theory currently stands. 

If a more complete statistical description can be achieved, the 
analysis of correlated modelling residuals may form an extremely 
useful addition to existing model selection criteria based upon cal- 
culations of the statistical likelihood, such as the AIC, BIC and 
Bayesian evidence. The technique requires little prior knowledge of 
the uncertainties upon individual data points, except for the simpli- 
fying approximation that they are random and uncorrelated, making 
it potentially more robust that chi-squared when uncertainties are 
difficult to estimate. Finally, the greatest interest in the technique 



may lie in where it most differs from standard model selection cri- 
teria: the fact that it takes the relative locations of data points, as 
well as their data values, directly into account. This information is 
valuable, and efficient ways to make use of it may be possible. 
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